Using Linguistic Graph Similarity to Search for Sentences in News Articles

Authors

  • Kim Schouten
  • Flavius Frasincar

Abstract

With the volume of daily news growing too large for any individual to handle, there is a clear need for effective search algorithms. Traditional bag-of-words approaches are inherently limited because they ignore much of the information embedded in the structure of the text. In this paper we therefore propose Destiny, a linguistic approach to search. With Destiny, sentences from both news items and user queries are represented as graphs in which the nodes represent the words in the sentence and the edges represent the grammatical relations between those words. The proposed algorithm is evaluated against a TF-IDF baseline using a custom corpus of user-rated sentences. Destiny significantly outperforms TF-IDF in terms of Mean Average Precision, normalized Discounted Cumulative Gain, and Spearman’s Rho.
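
To make the graph representation described above concrete, the following is a minimal Python sketch, not the Destiny algorithm itself. It assumes the grammatical relations are already available as (head, relation, dependent) triples. The triples below are written by hand, the relation labels are illustrative, and the function names (sentence_graph, jaccard, graph_similarity) and the equal node/edge weighting are hypothetical choices made for this example only.

```python
# Illustrative sketch only: a toy sentence graph and a naive overlap score.
# This is NOT the similarity measure used by Destiny; it merely shows why
# edges (grammatical relations) carry information that a bag of words lacks.

def sentence_graph(triples):
    """Build a (nodes, edges) pair from (head, relation, dependent) triples."""
    edges = frozenset(triples)
    nodes = frozenset(word for head, _, dep in edges for word in (head, dep))
    return nodes, edges

def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def graph_similarity(g1, g2, node_weight=0.5):
    """Weighted mix of node overlap and edge overlap (assumed weighting)."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    return (node_weight * jaccard(nodes1, nodes2)
            + (1 - node_weight) * jaccard(edges1, edges2))

# Hand-written parses (assumed, not produced by a real parser).
query = sentence_graph([("buys", "nsubj", "Google"), ("buys", "dobj", "startup")])
news_a = sentence_graph([("buys", "nsubj", "Google"), ("buys", "dobj", "company")])
news_b = sentence_graph([("buys", "nsubj", "startup"), ("buys", "dobj", "Google")])

word_overlap_b = jaccard(query[0], news_b[0])   # same words as the query
edge_overlap_b = jaccard(query[1], news_b[1])   # but no shared relations
score_a = graph_similarity(query, news_a)
score_b = graph_similarity(query, news_b)
```

In this toy setting, news_b contains exactly the same words as the query, so a pure word-overlap view cannot tell them apart, while the edge overlap is zero because the subject and object are swapped. How Destiny actually weighs nodes and edges is not specified in the abstract.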

Similar resources

A News Summarization System using Fuzzy Graph Based Document Model

This paper describes a news summarization system using the Fuzzy Graph based Document Model. News articles are modelled as fuzzy graphs whose nodes are sentences and whose edges are weighted by the fuzzy similarity measure between the sentences. The similarity between sentences lies between 0 and 1. Graph centrality is used to retrieve important sentences. The proposed system produces summaries by Eige...

A Linguistic Graph-Based Approach for Web News Sentence Searching

With an ever increasing amount of news being published every day, being able to effectively search these vast amounts of information is of primary interest to many Web ventures. As word-based approaches have their limits in that they ignore a lot of the information in texts, we present Destiny, a linguistic approach where news item sentences are represented as a graph featuring disambiguated wo...

Comparison of Different Machine Learning Methods for Extractive Speech-to-Speech Summarization of Persian Without Using Transcripts

In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization deals with extracting important and salient segments from speech in order to access, search, extract and browse speech files more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...

A Persian Text Summarization System Based on Linguistic Features and Regression

Considering the vast amount of existing written information and the shortage of time, optimal summarization of books, articles, news reports, etc. on the Web is a major concern of researchers. In this paper, we propose a new approach for Persian single-document summarization based on several linguistic features of text. In our approach, after extracting the linguistic features for each sentence,...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

Publication date: 2016